Model overview and combinations, Dynamic memory networks. CS224n lecture 16.
Model overview and combinations
Model comparison:
- Bag of Vectors: Surprisingly good baseline for simple text classification problems, especially if followed by a few ReLU layers!
- Window Model: Good for single-word classification for problems that do not need wide context, e.g. POS tagging
- CNNs: Good for classification, unclear how to incorporate phrase-level annotation (can only take a single label), need zero padding for shorter phrases, hard to interpret, easy to parallelize on GPUs, can be very efficient and versatile
- Recurrent Neural Networks: Cognitively plausible (reading from left to right, keeping a state), not the best for classification (simpler n-gram models often do as well), slower than CNNs, can do sequence tagging and classification, very active research, amazing with attention mechanisms
- TreeRNNs: Linguistically plausible, hard to parallelize, tree structures are discrete and harder to optimize, need a parser
- Combinations and extensions!
Rarely do we use the vanilla models as is.
TreeLSTMs
- LSTMs are great
- TreeRNNs can benefit from gates too -> TreeRNNs + LSTMs
- Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, Christopher D. Manning
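The Child-Sum Tree-LSTM cell from the Tai et al. paper can be sketched as follows. This is a minimal NumPy sketch; the class and parameter names are ours, not from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ChildSumTreeLSTMCell:
    """Sketch of the Child-Sum Tree-LSTM cell (Tai et al., 2015).

    Unlike a chain LSTM, a node sums its children's hidden states and
    computes one forget gate per child, so information from each
    subtree can be kept or dropped independently.
    """
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        d, h = input_dim, hidden_dim
        # one (W, U, b) triple per gate: input, forget, output, candidate
        self.W = {g: rng.normal(0, 0.1, (h, d)) for g in "ifou"}
        self.U = {g: rng.normal(0, 0.1, (h, h)) for g in "ifou"}
        self.b = {g: np.zeros(h) for g in "ifou"}

    def __call__(self, x, children):
        """x: input vector at this node; children: list of (h, c) pairs."""
        h_sum = sum((hk for hk, _ in children), np.zeros_like(self.b["i"]))
        i = sigmoid(self.W["i"] @ x + self.U["i"] @ h_sum + self.b["i"])
        o = sigmoid(self.W["o"] @ x + self.U["o"] @ h_sum + self.b["o"])
        u = np.tanh(self.W["u"] @ x + self.U["u"] @ h_sum + self.b["u"])
        c = i * u
        # a separate forget gate for each child, keyed on that child's h
        for h_k, c_k in children:
            f_k = sigmoid(self.W["f"] @ x + self.U["f"] @ h_k + self.b["f"])
            c = c + f_k * c_k
        return np.tanh(c) * o, c  # (h, c) for this node
```

Leaf nodes simply pass `children=[]`, so the same cell covers the whole tree bottom-up.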
Quasi-Recurrent Neural Network
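The QRNN (Bradbury, Merity, Xiong, Socher, 2016) computes its gates for all time steps in parallel with convolutions, and only a cheap elementwise recurrence ("fo-pooling") runs sequentially. A minimal sketch of that pooling step, assuming the convolutions producing Z, F, O have already been applied:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrnn_fo_pool(Z, F, O):
    """fo-pooling from the QRNN, as a sketch.

    Z, F, O: (T, h) arrays produced by convolutions over the input
    sequence (the convolutions are omitted here; they can process all
    time steps in parallel, which is what makes the QRNN fast).
    Only this pooling runs sequentially, and it is elementwise, so it
    is cheap compared to a full matrix-multiply recurrence.
    """
    T, h = Z.shape
    c = np.zeros(h)
    H = np.zeros((T, h))
    for t in range(T):
        f = sigmoid(F[t])
        c = f * c + (1.0 - f) * np.tanh(Z[t])  # gated running state
        H[t] = sigmoid(O[t]) * c               # output gate
    return H
```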
Neural Architecture Search (Google NAS)
- The manual process of finding the best units and architectures requires a lot of expertise
- What if we could use AI to find the right architecture for any problem?
- Neural architecture search with reinforcement learning by Zoph and Le, 2016
Dynamic Memory Network
Architecture of DMN
On the left, the word vectors of every word in each input sentence are fed into the Input Module's GRU. The Question Module is also a GRU, and the two GRUs can share weights.
The Question Module computes a question vector q. With q, an attention mechanism looks back over the input at different time steps: depending on the attention strength, some inputs are ignored while others are attended to. The attended inputs enter the Episodic Memory Module. For example, if the question is about the location of the football, all inputs related to the football and to locations are passed into that module. Each hidden state of the module is fed to the Answer Module, where a softmax produces the answer sequence.
The Episodic Memory Module has two lines, representing the memory from reading the input a first time with question q in mind, and the memory from a second reading with q.
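The data flow through the four modules can be sketched at a high level. The module functions here are hypothetical placeholders standing in for the GRU-based modules described in the following sections:

```python
def dmn_forward(input_facts, question, gru_input, gru_question,
                episodic_memory, answer_module, passes=2):
    """High-level data flow of a Dynamic Memory Network (sketch).

    All module functions are hypothetical placeholders; the real
    modules are GRU-based, as described in the sections below.
    """
    facts = [gru_input(f) for f in input_facts]   # Input Module
    q = gru_question(question)                    # Question Module
    m = q                                         # initial memory
    for _ in range(passes):                       # Episodic Memory Module:
        m = episodic_memory(facts, q, m)          # attention-gated pass over facts
    return answer_module(m, q)                    # Answer Module
```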
The Modules: Input
Further Improvement: BiGRU
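A sketch of the bidirectional idea, assuming hypothetical per-step GRU functions: each fact representation is the sum of a forward-pass and a backward-pass state, so it sees context from both sides:

```python
import numpy as np

def bigru_states(sentence_vectors, gru_forward, gru_backward):
    """Bidirectional encoding of the input facts (sketch).

    gru_forward / gru_backward are hypothetical step functions
    h_t = gru(x_t, h_{t-1}); the two directions are combined by
    summation so each position sees both left and right context.
    """
    T = len(sentence_vectors)
    h = np.zeros_like(sentence_vectors[0])
    fwd = []
    for x in sentence_vectors:                # left-to-right pass
        h = gru_forward(x, h)
        fwd.append(h)
    h = np.zeros_like(sentence_vectors[0])
    bwd = [None] * T
    for t in range(T - 1, -1, -1):            # right-to-left pass
        h = gru_backward(sentence_vectors[t], h)
        bwd[t] = h
    return [f + b for f, b in zip(fwd, bwd)]
```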
The Modules: Question
$$
q_{t} = GRU(v_{t}, q_{t-1})
$$
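The standard GRU update behind this equation, written out as a sketch (the parameter names in `params` are ours, not from the lecture); the question vector q is the state after the last question word has been fed in:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(v, q_prev, params):
    """One GRU step q_t = GRU(v_t, q_{t-1}), standard GRU equations."""
    z = sigmoid(params["Wz"] @ v + params["Uz"] @ q_prev + params["bz"])  # update gate
    r = sigmoid(params["Wr"] @ v + params["Ur"] @ q_prev + params["br"])  # reset gate
    q_tilde = np.tanh(params["Wh"] @ v + params["Uh"] @ (r * q_prev) + params["bh"])
    return (1 - z) * q_prev + z * q_tilde  # interpolate old state and candidate
```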
The Modules: Episodic Memory
Gates are activated if the sentence is relevant to the question or to the memory:
If the summary is insufficient to answer the question, repeat the sequence over the input (take another pass).
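In the DMN paper the episode is computed with a modified GRU that only updates its state where the gate is open: $h_t = g_t \, GRU(c_t, h_{t-1}) + (1 - g_t) \, h_{t-1}$, where the gate is a small network over similarity features of fact, question, and memory (e.g. $c \circ q$, $c \circ m$, $|c - q|$, $|c - m|$). A sketch of one pass over the facts, with `gru_step` and `gate_fn` as hypothetical placeholders:

```python
import numpy as np

def episode(facts, q, m, gru_step, gate_fn):
    """One pass of the Episodic Memory Module (sketch).

    gate_fn(c, q, m) returns a scalar gate g in [0, 1] measuring how
    relevant fact c is to the question q and current memory m.
    The GRU state only moves where the gate is open:
        h_t = g_t * GRU(c_t, h_{t-1}) + (1 - g_t) * h_{t-1}
    """
    h = np.zeros_like(m)
    for c in facts:
        g = gate_fn(c, q, m)
        h = g * gru_step(c, h) + (1.0 - g) * h
    return h  # the episode, used to update the memory m
```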
The Modules: Answer
- $a_{t}$ : the hidden state of the answer GRU at step $t$, initialized with the final memory
- $y_{t-1}$ : the output at the previous time step
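In the DMN paper (Kumar et al., 2016) the Answer Module is a GRU whose input concatenates the previous output with the question vector: $a_t = GRU([y_{t-1}, q], a_{t-1})$, $y_t = \mathrm{softmax}(W^{(a)} a_t)$, with $a_0 = m$. A sketch, where `gru_step` and the weight `Wa` are hypothetical placeholders:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def answer_module(m, q, gru_step, Wa, steps):
    """Answer Module sketch: a_0 = m (final memory); at each step the
    GRU consumes [y_{t-1}; q] and a softmax over W^(a) a_t emits y_t."""
    a = m
    y = np.zeros(Wa.shape[0])  # y_0 starts empty
    outputs = []
    for _ in range(steps):
        a = gru_step(np.concatenate([y, q]), a)
        y = softmax(Wa @ a)
        outputs.append(y)
    return outputs
```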